How to Evaluate Automatic Speech Recognition: Comparing Different Performance and Bias Measures

Patel, Tanvina, Hutiri, Wiebke, Ding, Aaron Yi, Scharenborg, Odette

arXiv.org Artificial Intelligence

There is growing evidence that automatic speech recognition (ASR) systems are biased against different speakers and speaker groups, e.g., due to gender, age, or accent. Research on bias in ASR has so far primarily focused on detecting and quantifying bias and on developing mitigation approaches. Despite this progress, how to measure the performance and bias of a system remains an open question. In this study, we compare different performance and bias measures, both from the literature and newly proposed, to evaluate state-of-the-art end-to-end ASR systems for Dutch. Our experiments use several bias mitigation strategies to address bias against different speaker groups. The findings reveal that averaged error rates, the standard in ASR research, are not sufficient on their own and should be supplemented by other measures. The paper ends with recommendations for reporting ASR performance and bias so as to better represent a system's performance for diverse speaker groups, as well as its overall bias.
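
The contrast between an averaged error rate and group-level measures can be made concrete with a small sketch. The snippet below is not the paper's evaluation code and uses toy data; it computes word error rate (WER) per speaker group and reports both the overall average and a simple gap-style bias measure (worst group minus best group), one possible supplement to the averaged rate.

```python
# Illustrative sketch (not the paper's code): per-group word error rate (WER)
# and a simple gap-style bias measure over hypothetical speaker groups.
from collections import defaultdict


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# (group, reference, hypothesis) triples -- toy data for illustration only.
utterances = [
    ("female", "the weather is nice today", "the weather is nice today"),
    ("female", "i would like some coffee", "i would like som coffee"),
    ("male",   "turn the lights off please", "turn the light of please"),
    ("male",   "play the next song", "play the next song"),
]

per_group = defaultdict(list)
for group, ref, hyp in utterances:
    per_group[group].append(wer(ref, hyp))

group_wer = {g: sum(v) / len(v) for g, v in per_group.items()}
overall = sum(sum(v) for v in per_group.values()) / sum(len(v) for v in per_group.values())
bias_gap = max(group_wer.values()) - min(group_wer.values())

print("per-group WER:", group_wer)
print("overall WER:  ", round(overall, 3))
print("WER gap (worst - best group):", round(bias_gap, 3))
```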


Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models

Faulborn, Mats, Sen, Indira, Pellert, Max, Spitz, Andreas, Garcia, David

arXiv.org Artificial Intelligence

Prompt-based language models like GPT4 and LLaMa have been used for a wide variety of use cases such as simulating agents, searching for information, or for content analysis. For all of these applications and others, political biases in these models can affect their performance. Several researchers have attempted to study political bias in language models using evaluation suites based on surveys, such as the Political Compass Test (PCT), often finding a particular leaning favored by these models. However, there is some variation in the exact prompting techniques, leading to diverging findings, and most research relies on constrained-answer settings to extract model responses. Moreover, the Political Compass Test is not a scientifically valid survey instrument. In this work, we contribute a political bias measure informed by political science theory, building on survey design principles to test a wide variety of input prompts, while taking into account prompt sensitivity. We then prompt 11 different open and commercial models, differentiating between instruction-tuned and non-instruction-tuned models, and automatically classify their political stances from 88,110 responses. Leveraging this dataset, we compute political bias profiles across different prompt variations and find that while PCT exaggerates bias in certain models like GPT3.5, measures of political bias are often unstable, but generally more left-leaning for instruction-tuned models.
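
The instability across prompt variations mentioned here comes down to a simple aggregation step. The sketch below assumes stance labels (left / neutral / right) have already been assigned to model responses for several hypothetical prompt variants, which is a strong simplification of the paper's pipeline, and turns them into a bias profile of mean lean plus spread per model.

```python
# Illustrative sketch (assumed data, not the paper's pipeline): aggregate
# already-classified stances of model responses across prompt variations
# into a simple bias profile (mean lean and spread).
from statistics import mean, pstdev

LEAN = {"left": -1, "neutral": 0, "right": 1}

# Hypothetical stance labels per prompt variant for two models.
responses = {
    "model_a": ["left", "left", "neutral", "left", "right"],
    "model_b": ["neutral", "right", "neutral", "left", "neutral"],
}

for model, stances in responses.items():
    scores = [LEAN[s] for s in stances]
    print(f"{model}: mean lean = {mean(scores):+.2f}, "
          f"spread across prompts = {pstdev(scores):.2f}")
```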


A Principled Approach for a New Bias Measure

Scarone, Bruno, Viola, Alfredo, Baeza-Yates, Ricardo

arXiv.org Artificial Intelligence

The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. The areas in which this is happening are diverse: healthcare, employment, finance, education, and the legal system, to name a few; and the associated negative side effects are increasingly harmful to society. Negative data bias is one of them, tending to result in harmful consequences for specific groups of people. Any mitigation strategy or effective policy that addresses the negative consequences of bias must start with awareness that bias exists, together with a way to understand and quantify it. However, there is a lack of consensus on how to measure data bias, and oftentimes the intended meaning is context dependent and not uniform within the research community. The main contributions of our work are: (1) a general algorithmic framework for defining and efficiently quantifying the bias level of a dataset with respect to a protected group; and (2) the definition of a new bias measure. Our results are experimentally validated using nine publicly available datasets and theoretically analyzed, which provides novel insights into the problem. Based on our approach, we also derive a bias mitigation algorithm that might be useful to policymakers.
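
The paper's own measure is not reproduced in this abstract, so the sketch below only illustrates the general task it addresses: quantifying dataset-level bias with respect to a protected group. It uses toy records and a common baseline quantity, the gap in favourable-outcome rates between the protected group and everyone else, which is not the measure the authors define.

```python
# Illustrative sketch only -- NOT the measure defined in the paper.
# A common starting point for dataset bias: the gap in favourable-outcome
# rates between a protected group and the rest of the dataset.
records = [
    # (protected_group_member, favourable_outcome) -- toy data
    (True, 1), (True, 0), (True, 0), (True, 1),
    (False, 1), (False, 1), (False, 0), (False, 1),
]

prot = [y for g, y in records if g]
rest = [y for g, y in records if not g]
gap = sum(rest) / len(rest) - sum(prot) / len(prot)
print(f"favourable rate (others) - favourable rate (protected) = {gap:+.3f}")
```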


Semantic Properties of cosine based bias scores for word embeddings

Schröder, Sarah, Schulz, Alexander, Hinder, Fabian, Hammer, Barbara

arXiv.org Artificial Intelligence

In the domain of Natural Language Processing (NLP), many works have investigated social biases in terms of associations in the embedding space. Early works [1, 2] introduced methods to measure and mitigate social biases based on cosine similarity in word embeddings. With NLP research progressing to large language models and contextualized embeddings, doubts have been raised whether these methods are still suitable for fairness evaluation [3], and other works criticize that for instance the Word Embedding Association Test (WEAT) [2] fails to detect some kinds of biases [4, 5]. Overall, a great number of bias measures exist in the literature, which do not necessarily detect the same biases [6, 4, 5]. In general, researchers are questioning the usability of model-intrinsic bias measures, such as cosine-based methods [7, 8, 9]. A few papers compare the performance of different bias scores [10, 11], and other works evaluate experimental setups for bias measurement [12]. However, to our knowledge, only two works investigate the properties of intrinsic bias scores on a theoretical level [5, 13]. To further close this gap, we evaluate the semantic properties of cosine based bias scores, focusing on bias quantification as opposed to bias detection. We make the following contributions: (i) We formalize the properties of trustworthiness and comparability as requirements for cosine based bias scores.
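
For readers unfamiliar with cosine-based bias scores, the sketch below shows the structure of one such score: the WEAT effect size of reference [2], computed on random toy vectors rather than real word embeddings, purely to make the formula concrete.

```python
# Illustrative sketch of a cosine-based bias score: the WEAT effect size
# (reference [2] in the abstract), computed on toy 3-dimensional vectors
# standing in for word embeddings.
import numpy as np


def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def assoc(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3)); Y = rng.normal(size=(4, 3))   # target word sets
A = rng.normal(size=(4, 3)); B = rng.normal(size=(4, 3))   # attribute word sets
print("WEAT effect size:", round(weat_effect_size(X, Y, A, B), 3))
```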


Undesirable Biases in NLP: Addressing Challenges of Measurement

van der Wal, Oskar, Bachmann, Dominik, Leidinger, Alina, van Maanen, Leendert, Zuidema, Willem, Schulz, Katrin

arXiv.org Artificial Intelligence

As Large Language Models and Natural Language Processing (NLP) technology rapidly develop and spread into daily life, it becomes crucial to anticipate how their use could harm people. One problem that has received a lot of attention in recent years is that this technology has displayed harmful biases, from generating derogatory stereotypes to producing disparate outcomes for different social groups. Although a lot of effort has been invested in assessing and mitigating these biases, our methods of measuring the biases of NLP models have serious problems and it is often unclear what they actually measure. In this paper, we provide an interdisciplinary approach to discussing the issue of NLP model bias by adopting the lens of psychometrics -- a field specialized in the measurement of concepts like bias that are not directly observable. In particular, we will explore two central notions from psychometrics, the construct validity and the reliability of measurement tools, and discuss how they can be applied in the context of measuring model bias. Our goal is to provide NLP practitioners with methodological tools for designing better bias measures, and to inspire them more generally to explore tools from psychometrics when working on bias measurement tools.
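
One of the two psychometric notions named here, reliability, lends itself to a tiny numeric illustration. The sketch below is not from the paper; it treats test-retest reliability as the Pearson correlation between bias scores obtained from two hypothetical measurement runs (e.g., different random seeds) over the same models.

```python
# Illustrative sketch of test-retest reliability for a bias measure:
# Pearson correlation between bias scores from two repeated runs.
# The scores are made up; the paper itself proposes no specific code.
from statistics import correlation  # Python 3.10+

run_1 = [0.41, 0.08, 0.33, 0.19, 0.27]  # bias score per model, run 1
run_2 = [0.39, 0.12, 0.30, 0.22, 0.25]  # same models, repeated measurement
print("test-retest reliability (Pearson r):", round(correlation(run_1, run_2), 3))
```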


Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels

Anantaprayoon, Panatchakorn, Kaneko, Masahiro, Okazaki, Naoaki

arXiv.org Artificial Intelligence

Discriminatory social biases, including gender biases, have been found in Pre-trained Language Models (PLMs). In Natural Language Inference (NLI), recent bias evaluation methods have observed biased inferences from the outputs of a particular label such as neutral or entailment. However, since different biased inferences can be associated with different output labels, it is inaccurate for a method to rely on one label. In this work, we propose an evaluation method that considers all labels in the NLI task. We create evaluation data and assign them to groups based on their expected biased output labels. Then, we define a bias measure based on the corresponding label output of each data group. In the experiment, we propose a meta-evaluation method for NLI bias measures, and then use it to confirm that our measure can evaluate bias more accurately than the baseline. Moreover, we show that our evaluation method is applicable to multiple languages by conducting the meta-evaluation on PLMs in three different languages: English, Japanese, and Chinese. Finally, we evaluate PLMs of each language to confirm their bias tendency. To our knowledge, we are the first to build evaluation datasets and measure the bias of PLMs from the NLI task in Japanese and Chinese.
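
The grouping-by-expected-label idea can be sketched generically. The snippet below is not the paper's measure; it assumes hypothetical NLI evaluation items tagged with the label an unbiased model would be expected to produce, and scores how often a model's predictions deviate from it in each group.

```python
# Illustrative sketch (not the paper's exact measure): group NLI evaluation
# items by the label an unbiased model would be expected to produce, then
# score how often predictions deviate from it in each group.
from collections import defaultdict

# (group, expected_label, predicted_label) -- hypothetical evaluation output.
items = [
    ("pro-stereotypical",  "neutral", "entailment"),
    ("pro-stereotypical",  "neutral", "neutral"),
    ("anti-stereotypical", "neutral", "contradiction"),
    ("anti-stereotypical", "neutral", "neutral"),
]

deviation = defaultdict(list)
for group, expected, predicted in items:
    deviation[group].append(predicted != expected)

for group, flags in deviation.items():
    print(f"{group}: deviation rate = {sum(flags) / len(flags):.2f}")
```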


The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

Selvam, Nikil Roashan, Dev, Sunipa, Khashabi, Daniel, Khot, Tushar, Chang, Kai-Wei

arXiv.org Artificial Intelligence

How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions for a given benchmark based on innocuous modifications (such as paraphrasing or random-sampling) that maintain the essence of their social bias. On two well-known social bias benchmarks (Winogender and BiasNLI) we observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models. We hope these troubling observations motivate more robust measures of social biases.
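
The random-sampling perturbation described here can be mimicked in a few lines. The sketch below uses a made-up per-item bias indicator rather than actual Winogender or BiasNLI scoring; it draws several alternative "constructions" of a benchmark by subsampling and reports how far the resulting bias score moves.

```python
# Illustrative sketch of the random-sampling perturbation idea: subsample a
# benchmark several times and look at the spread of a bias score. The score
# function is a stand-in, not Winogender/BiasNLI scoring.
import random

random.seed(0)
# Hypothetical per-item bias indicators (True = biased behaviour observed).
benchmark = [random.random() < 0.3 for _ in range(500)]


def bias_score(items):
    return sum(items) / len(items)


scores = []
for _ in range(20):                         # 20 alternative constructions
    subset = random.sample(benchmark, 300)  # innocuous subsampling
    scores.append(bias_score(subset))

print(f"bias score range across constructions: "
      f"{min(scores):.3f} to {max(scores):.3f}")
```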


Trustworthy Social Bias Measurement

Bommasani, Rishi, Liang, Percy

arXiv.org Artificial Intelligence

How do we design measures of social bias that we trust? While prior work has introduced several measures, no measure has gained widespread trust: instead, mounting evidence argues we should distrust these measures. In this work, we design bias measures that warrant trust based on the cross-disciplinary theory of measurement modeling. To combat the frequently fuzzy treatment of social bias in NLP, we explicitly define social bias, grounded in principles drawn from social science research. We operationalize our definition by proposing a general bias measurement framework DivDist, which we use to instantiate 5 concrete bias measures. To validate our measures, we propose a rigorous testing protocol with 8 testing criteria (e.g. predictive validity: do measures predict biases in US employment?). Through our testing, we demonstrate considerable evidence to trust our measures, showing they overcome conceptual, technical, and empirical deficiencies present in prior measures.
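
DivDist itself is not specified in this abstract and is not reproduced below. As a loosely related illustration of measuring bias via differences between group-conditioned distributions, the sketch computes a Jensen-Shannon divergence between toy unigram distributions of text associated with two groups; this is a generic example, not the paper's framework.

```python
# Illustrative sketch only (not DivDist): Jensen-Shannon divergence between
# group-conditioned unigram distributions, on toy token lists.
from collections import Counter
from math import log2

text_group_a = "gentle caring nurse gentle kind".split()
text_group_b = "strong leader boss strong decisive".split()


def distribution(tokens, vocab):
    counts = Counter(tokens)
    total = sum(counts.values())
    return [counts[w] / total for w in vocab]


def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


vocab = sorted(set(text_group_a) | set(text_group_b))
p = distribution(text_group_a, vocab)
q = distribution(text_group_b, vocab)
print("JSD between group word distributions:", round(jsd(p, q), 3))
```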


A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

Berg, Hugo, Hall, Siobhan Mackenzie, Bhalgat, Yash, Yang, Wonsuk, Kirk, Hannah Rose, Shtedritski, Aleksandar, Bain, Max

arXiv.org Artificial Intelligence

Vision-language models can encode societal biases and stereotypes, but measuring and mitigating these multimodal harms is challenging due to a lack of measurement robustness and to feature degradation. To address these challenges, we investigate bias measures and apply ranking metrics for image-text representations. We then investigate debiasing methods and show that prepending learned embeddings to text queries that are jointly trained with adversarial debiasing and a contrastive loss reduces various bias measures with minimal degradation to the image-text representation.
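
Ranking metrics of the kind mentioned here compare who appears at the top of a retrieval list with who is in the candidate pool. The sketch below is a Skew@K-style quantity on hypothetical group labels, not the paper's exact metrics: it contrasts a group's share among the top-K retrieved images for a neutral query with its share in the whole pool.

```python
# Illustrative sketch of a ranking-style retrieval bias metric (Skew@K-like;
# the paper's exact metrics are not reproduced here).
from math import log

# Hypothetical group labels of images, ordered by similarity to a neutral query.
ranked_groups = ["woman", "woman", "man", "woman", "man", "man", "woman", "man"]
K = 4

pool_share = ranked_groups.count("woman") / len(ranked_groups)
topk_share = ranked_groups[:K].count("woman") / K
skew_at_k = log(topk_share / pool_share) if topk_share > 0 else float("-inf")

print(f"share in pool: {pool_share:.2f}, share in top-{K}: {topk_share:.2f}")
print(f"Skew@{K} for group 'woman': {skew_at_k:+.3f}")
```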


On Measures of Biases and Harms in NLP

Dev, Sunipa, Sheng, Emily, Zhao, Jieyu, Amstutz, Aubrie, Sun, Jiao, Hou, Yu, Sanseverino, Mattie, Kim, Jiin, Nishi, Akihiro, Peng, Nanyun, Chang, Kai-Wei

arXiv.org Artificial Intelligence

Recent studies show that Natural Language Processing (NLP) technologies propagate societal biases about demographic groups associated with attributes such as gender, race, and nationality. To create interventions and mitigate these biases and associated harms, it is vital to be able to detect and measure such biases. While existing works propose bias evaluation and mitigation methods for various tasks, there remains a need to cohesively understand the biases and the specific harms they measure, and how different measures compare with each other. To address this gap, this work presents a practical framework of harms and a series of questions that practitioners can answer to guide the development of bias measures. As a validation of our framework and documentation questions, we also present several case studies of how existing bias measures in NLP -- both intrinsic measures of bias in representations and extrinsic measures of bias of downstream applications -- can be aligned with different harms, and how our proposed documentation questions facilitate a more holistic understanding of what bias measures are measuring.